Estimating and Clustering Curves in the Presence of Heteroscedastic Errors
Abstract
This paper introduces a technique for estimating a large number of curves observed with heteroscedastic errors and for discovering the patterns underlying them. Both the mean and the variance functions of each curve are assumed unknown and varying over time. The method proceeds in a series of steps. We first transform the curves using an orthonormal basis of functions in L2. In the transform domain, the nonparametric regression reduces to a means model. To estimate the means in the transform domain, we consider the class of linear or modulation estimators and proceed as in Beran and Dümbgen (1998) by minimizing Stein's unbiased risk estimate. By minimizing the risk over a nested subset selection of modulators, we reduce the dimensionality of the means space. We show that, in the transform space, the risk estimate is asymptotically optimal in Pinsker's minimax sense over Sobolev ellipsoids under heteroscedastic errors. Coefficient estimation and dimensionality reduction via optimal risk estimation are essential for accurate estimation of cluster membership. We illustrate the technique by estimating and clustering a large number of curves, both in a synthetic example and in a specific application: the research and development expenditure of a subset of companies in the Compustat Global database. We show that our method compares favorably to two alternative approaches.
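The transform-and-select steps described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes an orthonormal cosine basis on an equispaced grid, independent errors, and noise variances that are already known (the paper treats them as unknown, so in practice they would have to be estimated). All function names are hypothetical.

```python
import math
import random


def cosine_basis(n):
    """Rows are orthonormal cosine (DCT-II) basis vectors on n equispaced points."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * k * (j + 0.5) / n) for j in range(n)])
    return rows


def transform(basis, y):
    """Coefficients z_k = <basis_k, y> of the curve in the transform domain."""
    return [sum(b * v for b, v in zip(row, y)) for row in basis]


def coefficient_variances(basis, noise_var):
    """Var(z_k) = sum_j basis_k(j)^2 * noise_var_j, assuming independent errors."""
    return [sum(b * b * s for b, s in zip(row, noise_var)) for row in basis]


def select_cutoff(z, var_z):
    """Nested subset selection: keep the first k coefficients, zero the rest.

    The estimated risk of cutoff k is
        sum_{i < k} var_i + sum_{i >= k} (z_i^2 - var_i),
    where z_i^2 - var_i is an unbiased estimate of the squared mean.
    Returns the minimizing k and its estimated risk.
    """
    risk = sum(zi * zi - vi for zi, vi in zip(z, var_z))  # k = 0: keep nothing
    best_k, best_risk = 0, risk
    for k in range(1, len(z) + 1):
        zi, vi = z[k - 1], var_z[k - 1]
        risk += 2.0 * vi - zi * zi  # moving coefficient k-1 into the kept set
        if risk < best_risk:
            best_k, best_risk = k, risk
    return best_k, best_risk


def reconstruct(basis, z, k):
    """Curve estimate built from the first k coefficients only."""
    n = len(z)
    return [sum(z[i] * basis[i][j] for i in range(k)) for j in range(n)]


if __name__ == "__main__":
    random.seed(0)
    n = 64
    basis = cosine_basis(n)
    t = [(j + 0.5) / n for j in range(n)]
    mean = [math.cos(math.pi * tj) + 0.5 * math.cos(3 * math.pi * tj) for tj in t]
    sd = [0.1 + 0.4 * tj for tj in t]  # noise level grows over time (assumed known)
    y = [m + s * random.gauss(0.0, 1.0) for m, s in zip(mean, sd)]

    z = transform(basis, y)
    var_z = coefficient_variances(basis, [s * s for s in sd])
    k, risk = select_cutoff(z, var_z)
    fitted = reconstruct(basis, z, k)
    print("selected cutoff:", k, "estimated risk:", round(risk, 4))
```

The retained coefficients give a low-dimensional summary of each curve, which is the kind of representation a clustering step could then operate on; the clustering itself is not sketched here.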
Similar resources
Clustering Curves in the Presence of Heteroscedastic Errors
The clustering technique introduced in this paper is a means for discovering underlying patterns among a large number of curves. One novel characteristic compared to the current clustering methods is that we allow for heteroscedastic errors. Both the mean and the variance functions of each curve are assumed unknown and varying over time. The clustering method consists of a series of steps: tran...
On Presentation a new Estimator for Estimating of Population Mean in the Presence of Measurement error and non-Response
Introduction: According to classical sampling theory, the errors mainly considered in estimation are sampling errors. However, most non-sampling errors affect the properties of estimators more than sampling errors do. This has been confirmed by researchers over the past two decades, especially in relation to non-response errors that are one of the most fundamental non-immolation...
A robust wavelet based profile monitoring and change point detection using S-estimator and clustering
Some quality characteristics are well defined when treated as response variables and are related to some independent variables. This relationship is called a profile. Parametric models, such as linear models, may be used to model profiles. However, in practical applications, due to the complexity of many processes, it is not usually possible to model a process using parametric models. In these cas...
Wavelet designs for estimating nonparametric curves with heteroscedastic error
In this paper, we discuss the problem of constructing designs in order to maximize the accuracy of nonparametric curve estimation in the possible presence of heteroscedastic errors. Our approach is to exploit the flexibility of wavelet approximations to approximate the unknown response curve by its wavelet expansion thereby eliminating the mathematical difficulty associated with the unknown s...
A New Method for Duplicate Detection Using Hierarchical Clustering of Records
Accuracy and validity of data are prerequisites for the proper operation of any software system. There is always a possibility of errors occurring in data due to human and system faults. One such error is the existence of duplicate records in data sources. Duplicate records refer to the same real-world entity. There should be only one of them in a data source, but for some reasons like aggregation of ...